Journal of Bacteriology, Apr. 1999, p. 2358–2362
0021-9193/99/$04.00+0

Copyright © 1999, American Society for Microbiology. All Rights Reserved. 



Genetic Diversity in the Protective Antigen 
Gene of Bacillus anthracis 

LANCE B. PRICE, 1 MARTIN HUGH-JONES, 2 PAUL J. JACKSON, 3 and PAUL KEIM 1 *
Department of Biological Science, Northern Arizona University, Flagstaff, Arizona 86011-5640 1 ; Department of
Epidemiology and Community Health, School of Veterinary Medicine, Louisiana State University,
Baton Rouge, Louisiana 70803 2 ; and Environmental Molecular Biology Group, Los Alamos
National Laboratory, Los Alamos, New Mexico 87545 3

Received 2 December 1998/ Accepted 29 January 1999 

Bacillus anthracis is a gram-positive spore-forming bacterium that causes the disease anthrax. The anthrax 
toxin contains three components, including the protective antigen (PA), which binds to eucaryotic cell surface 
receptors and mediates the transport of toxins into the cell. In this study, the entire 2,294-nucleotide protective 
antigen gene (pag) was sequenced from 26 of the most diverse B. anthracis strains to identify potential variation
in the toxin and to further our understanding of B. anthracis evolution. Five point mutations, three synonymous 
and two missense, were identified. These differences correspond to six different haploid types, which translate 
into three different amino acid sequences. The two amino acid changes were shown to be located in an area near 
a highly antigenic region critical to lethal factor binding. Nested primers were used to amplify and sequence 
this same region of pag from necropsy samples taken from victims of the 1979 Sverdlovsk incident. This 
investigation uncovered five different alleles among the strains present in the tissues, including two not seen 
in the 26-sample survey. One of these two alleles included a novel missense mutation, again located just 
adjacent to the highly antigenic region. Phylogenetic (cladistic) analysis of pag corresponded with previous
strain grouping based on chromosomal variation, suggesting that plasmid evolution in B. anthracis has 
occurred with little or no horizontal transfer between the different strains. 



Bacillus anthracis is the causative organism of the potentially 
fatal disease anthrax. Virulent forms of B. anthracis carry two 
large plasmids, pXO1 (ca. 174 kb) and pXO2 (ca. 95 kb). Virulence
factors include toxin and capsule production, encoded
on pXO1 and pXO2, respectively. The anthrax toxin is composed
of three proteinaceous subunits: (i) lethal factor (LF),
the toxin component thought to kill host cells by disrupting the
mitogen-activated protein kinase pathway (2); (ii) edema factor
(EF), an adenylyl cyclase that causes skin edema in the
infected host (6); and (iii) protective antigen (PA), which binds 
to eucaryotic cell surface proteins, forms homoheptamers, and 
then binds to and internalizes EF and LF. 

The structure and function of PA have been well described. 
The entire PA gene (pag) sequence has been published and is 
available in GenBank (accession no. M22589) (12). The three- 
dimensional structure has also been solved and is available in 
the NCBI Entrez 3D database (MMDB no. 6980) (10). Finally, 
antibody-binding experiments have been used to define regions 
of the PA protein critical to cell surface attachment as well as 
LF binding (8). Missing from the literature until now was a 
population study of pag from diverse strains of B. anthracis to 
define the natural variation in this important gene. 

In past studies, plasmid-specific genetic variation in B. anthracis
has been largely ignored. A recent population study,
based on chromosomal markers, demonstrated that B. anthracis
is one of the most monomorphic bacterial species known
(5). This chromosomal amplified fragment length polymorphism
study examined ca. 6.3% of the B. anthracis genome for
length variations and ca. 0.36% for point mutations. However,
due to ambiguities arising from the absence of one or both of
the plasmids, plasmid data were omitted from the final results.
Studies of pXO1 diversity and especially of pag are essential
to understanding evolution of pathogenesis in B. anthracis.
Likewise, comparative studies of plasmid-based versus
chromosomal variation can provide insight into the frequency
of horizontal plasmid transfer in natural B. anthracis populations.

In this study we have sequenced the entire pag gene from 26 
of the most diverse strains of B. anthracis (5). These sequences 
were aligned and analyzed for point mutations and then studied
phylogenetically to determine if the pag data are consistent 
with chromosomal diversity groups. Additionally, we se- 
quenced a 307-bp variable region of pag from 10 Sverdlovsk 
anthrax victim necropsy samples (4) in order to identify novel 
pag sequences. 

MATERIALS AND METHODS 

B. anthracis DNA. Culture conditions, DNA isolation methods, and diversity
groups are described in reference 5. Necropsy tissue DNA was isolated as
described by Jackson et al. (4).

PCR amplification of DNA. Table 1 contains the sequences for all primers
used for this project. These were designed from the published pag sequence
(GenBank accession no. M22589) and synthesized by Gibco/BRL, Bethesda, Md.
All primer positions cited throughout this report are based on this GenBank
sequence. Two DNA fragments, together totaling 2,531 bp of sequence, were
initially amplified as sequencing template from the 26 B. anthracis
strains. PA-1F and PA-1R were used to amplify a 1,191-bp fragment containing
the 5' portion of PA. This included 131 bp of upstream flanking sequence. PA-2F
and PA-2R were used to amplify a 1,449-bp fragment containing the 3' portion
of PA. This included 106 bp of downstream flanking sequence. The two fragments
contained 109 bp of overlapping sequence near the middle of the gene.
Fifty-microliter PCR mixtures contained 1× PCR buffer (20 mM Tris [pH illegible],
[illegible] mM KCl; Gibco/BRL), 0.10 mM deoxynucleoside triphosphates, [illegible]
MgCl2, ~0.2 ng of template DNA per µl, 0.04 U of Taq DNA polymerase
(Gibco/BRL) per µl, and 0.4 µM forward and reverse primers, adjusted
with filtered (0.2-µm-pore-size filter) 17.8-Mohm E-pure water. Reactions were
heated to 94°C for 5 min and then subjected to 35 cycles of [illegible]
at 94°C, 30 s at 62°C, and 1.5 min at 72°C. This was followed by heating
[illegible] for 5 min [illegible] extension. PCR products were purified with

TABLE 1. Primers used in this study

Primer       Use ᵃ     Sequence                                        Position ᵇ
PA-1F        Amp/Seq   ATA TTT ATA AAA GTT CTG TTT AAA AAG CC          5'-1673
PA-1R        Amp/Seq   TAA ATC CTG CAG ATA CAC TCC CAC                 3'-2840
PA-2F        Amp/Seq   ATA AGT AAA AAT ACT TCT ACA AGT AGG ACA C       5'-2755
PA-2R        Amp/Seq   GAT TTA GAT TAC TGT TTA AAA CAT ACT CTC C       3'-4173
PA-3         Seq       TCA TGT AAC AAT GTG GGT AGA TGA C               5'-2145
PA-4         Seq       CTC TAT GAG CCT CCT TAA CTA CTG AC              3'-3717
PA-5F        Amp       ATC CTA GTG ATC CAT TAG AAA CGA C               5'-3416
PA-5R        Amp       CTT CTC TAT GAG CCT CCT TAA CTA CTG             3'-3719
PA-5Fnest    Amp/Seq   AGT GAT CCA TTA GAA ACG AC                      5'-3421
PA-5Rnest    Amp/Seq   TAA CTA CTG ACT CAT CCG C                       3'-3709

ᵃ Amp, used for amplification; Seq, used for sequencing; Amp/Seq, used for both amplification and sequencing.
ᵇ [Partly illegible] M22589 nucleotide position.
ᶜ NA, not applicable.



Qiaquick purification minicolumns (Qiagen Inc., Valencia, Calif.) and then
quantified on ethidium bromide-stained 1.25% agarose-Tris-acetate-EDTA gels.
These purified fragments were then used in subsequent sequencing reactions.
PCR amplification of necropsy sample DNA was performed as described by
Jackson et al. (4), using primers PA-5F, PA-5R, PA-5Fnest, and PA-5Rnest (Table 1).
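
The fragment sizes quoted in this section can be cross-checked against the primer coordinates in Table 1. The following is a minimal sketch, not part of the original methods. It assumes that a forward primer's listed position is its 5' end and that a reverse primer's listed position is its 3' end (the lowest-numbered base it covers in M22589 numbering); the dictionary and function names are illustrative only.

```python
# Primer data transcribed from Table 1 (GenBank M22589 numbering).
# ASSUMPTION: forward primers are listed by their 5' end; reverse primers
# are listed by their 3' end, i.e. the lowest-numbered base they cover.
REVERSE_PRIMERS = {
    "PA-1R": ("TAA ATC CTG CAG ATA CAC TCC CAC", 2840),
    "PA-2R": ("GAT TTA GAT TAC TGT TTA AAA CAT ACT CTC C", 4173),
    "PA-5R": ("CTT CTC TAT GAG CCT CCT TAA CTA CTG", 3719),
    "PA-5Rnest": ("TAA CTA CTG ACT CAT CCG C", 3709),
}
FORWARD_5P = {"PA-1F": 1673, "PA-2F": 2755, "PA-5F": 3416, "PA-5Fnest": 3421}

GENE_LENGTH = 2294  # length of pag as stated in the text


def amplicon(fwd, rev):
    """Return (start, end, length) of the product of one primer pair."""
    seq, three_prime = REVERSE_PRIMERS[rev]
    primer_len = len(seq.replace(" ", ""))
    end = three_prime + primer_len - 1  # highest-numbered base in the product
    start = FORWARD_5P[fwd]
    return start, end, end - start + 1


frag1 = amplicon("PA-1F", "PA-1R")
frag2 = amplicon("PA-2F", "PA-2R")
nested = amplicon("PA-5Fnest", "PA-5Rnest")

overlap = frag1[1] - frag2[0] + 1   # bases shared by the two fragments
total = frag2[1] - frag1[0] + 1     # bases covered by both fragments together
flanking = total - GENE_LENGTH      # bases outside the gene

print(frag1, frag2, overlap, total, flanking, nested)
```

Under these assumptions the two fragments come out at 1,191 and 1,449 bp with a 109-bp overlap and 2,531 bp of total coverage, of which 237 bp (131 + 106) is flanking sequence, and the nested product is 307 bp, in agreement with the figures given in the text.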

DNA sequencing. PCR products were sequenced on an ABI model 377 fluorescence
sequencer using a PRISM Ready Reaction BigDye terminator cycle
sequencing kit (both from Perkin-Elmer/Applied Biosystems Inc., Foster City,
Calif.). Sequences were aligned and analyzed with Sequence Navigator software
(Perkin-Elmer/Applied Biosystems Inc.).

Cladistic analysis. Cladistic analysis was performed on the pag sequences by
using maximum parsimony with PAUP 3.1.1 software (developed by David L.
Swofford, Illinois Natural History Survey) and manual examinations of sequence
polymorphisms.

Three-dimensional analysis. The PA structure has been solved and is available
on the NCBI Entrez 3D database (MMDB no. 6980) (10). Amino acid residues
shown to vary among strains were identified on the three-dimensional structure,
and then physical distances from the putative LF binding region of PA domains
3 and 4 were estimated by using MAGE 4.5 software (developed by David
Richardson, Biochemistry Department, Duke University, Durham, N.C.).



RESULTS 

Sequence alignment of the entire PA gene from 26 strains
representative of the five B. anthracis diversity groups (5) (Table 2)
revealed five point mutations, three synonymous and two
missense, shown in Table 3. All five mutations are transitions.
Two of the synonymous mutations occur only once. However,
the other differences are present with frequencies ranging from
3/26 to 20/26. The two missense mutations are located adjacent
to a highly antigenic region crossing the junction between PA
domains 3 and 4 shown to be critical to LF binding (Fig. 1) (8,
10). The different mutational combinations observed in this



TABLE 2

Strain       Origin                   Diversity group ᵃ   PA genotype ᵇ   PA phenotype ᶜ
BA0052       Jamaica                  Sterne-Ames         I               FPA
BA1087       Scotland                 Sterne-Ames         I               FPA
J611         Indonesia                Sterne-Ames         I               FPA
BA1031       South Africa             Sterne-Ames         I               FPA
BA1043       South Africa             Sterne-Ames         I               FPA
28           Ohio                     Sterne-Ames         II              FPA
MOZ-3        Mozambique               Southern Africa     III             FPA
BA1035       South Africa             Southern Africa     III             FPA
33           South Africa             Southern Africa     IV              FPA
A24          Slovakia                 Southern Africa     V               FPV
K20          South Africa (Kruger)    Kruger              V               FPV
26/05/94     Zambia                   Kruger              V               FPV
BA1033       South Africa             WNA                 V               FPV
BA1017       Haiti                    WNA                 V               FPV
BA1015       Maryland                 WNA                 V               FPV
93-194C      Canada                   WNA                 V               FPV
93-195C-8    [illegible]              WNA                 V               FPV
BA1040       Colorado                 WNA                 V               FPV
BA1007       [illegible]              WNA                 V               FPV
Pak-2        Turkey                   WNA                 V               FPV
[illegible]  Pakistan                 WNA                 V               FPV
STI-1        Russian vaccine strain   WNA                 V               FPV
F-1          South Korea              Vollum              V               FPV
BA1024       Ireland                  Vollum              VI              FSV
ASC-3        United Kingdom           Vollum              VI              FSV
BA1009       Pakistan                 Vollum              VI              FSV

ᵃ Diversity designations are consistent with those described by Keim et al. (5).
ᵇ Described in Table 4.
ᶜ Designated by the single-letter designations of the three amino acids shown to vary in this study.

FIG. 1. Model of pag from B. anthracis. S, region of gene that codes for cleaved signal region; NP-F and NP-R, forward and reverse nested primers used to amplify
variable regions from the Sverdlovsk samples; black vertical arrows, missense mutations; grey vertical arrows, synonymous mutations; HAR, highly antigenic
region important to LF binding (8, 10). Dom., domain.




study give rise to six PA genotypes and three PA phenotypes 
(Table 4). 
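
The relationship between genotypes, phenotypes, and mutation frequencies can be illustrated with a short tabulation. This is a sketch for the reader rather than the authors' procedure. The allele strings and strain counts are transcribed from Table 4, with genotype labels I to VI following their use in Table 2. The choice to count, at each site, the strains whose allele differs from genotype I is an assumption made here for illustration.

```python
from collections import Counter

# Alleles at mutations 1-7 and strain counts for the six genotypes found in
# the 26-strain survey, transcribed from Table 4.
GENOTYPES = {
    "I": ("CGTCCTA", 5),
    "II": ("CGTCCCA", 1),
    "III": ("TGTCCTA", 2),
    "IV": ("TATCCTA", 1),
    "V": ("TGTCTTA", 14),
    "VI": ("TGTTTTA", 3),
}

# Missense sites (0-based index into the allele string) and the residue each
# allele encodes: mutation 3 (F/L), mutation 4 (P/S), mutation 5 (A/V).
RESIDUES = {
    2: {"T": "F", "C": "L"},
    3: {"C": "P", "T": "S"},
    4: {"C": "A", "T": "V"},
}


def phenotype(alleles):
    """Three-letter PA phenotype implied by a seven-site genotype."""
    return "".join(RESIDUES[i][alleles[i]] for i in sorted(RESIDUES))


phenotype_counts = Counter()
for alleles, n in GENOTYPES.values():
    phenotype_counts[phenotype(alleles)] += n

# ASSUMPTION: per site, count strains whose allele differs from genotype I.
reference = GENOTYPES["I"][0]
variant_counts = [
    sum(n for alleles, n in GENOTYPES.values() if alleles[i] != reference[i])
    for i in range(len(reference))
]

print(dict(phenotype_counts))
print(variant_counts)
```

With these inputs the six genotypes collapse to three phenotypes (FPA in 9 strains, FPV in 14, FSV in 3). The non-singleton differences fall between 3/26 and 20/26, as stated above, and sites 3 and 7 do not vary among the 26 surveyed strains.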

Cladistic analysis of the 26 pag sequences was performed by
the maximum parsimony method to produce a gene tree (Fig.
2A). The 26 strains grouped into four clades of 3, 3, 6, and 14
individuals. These groups were defined by three synapomorphic
(informative) differences. In addition, we identified two
apomorphic (uninformative) nucleotide differences (mutations
2 and 6) that separated two strains (28 and 33) from others in
their clades. These mutations are identified on the respective
branches but were not used to isolate these strains from their
groups. The clades and topology identified by this tree were
mostly congruent with those generated from chromosomal
markers (Fig. 2B) (5). The only aberrations are the following:
(i) chromosomal data from strain A24 indicate that it is of the
Southern Africa lineage (5), but the pag data place this strain
with the Western North America (WNA) diversity group (one
mutational step away); (ii) chromosomal data from strain F-1
indicate that it is of the Vollum lineage, but the pag data place
this strain with the WNA diversity group (again, one mutational
step away); and (iii) chromosomal markers indicate that
the Kruger samples, although very similar, are genetically distinct
from the WNA lineage, but the pag gene tree did
not resolve these two groups. It should be noted that
chromosomal markers indicate that Vollum and WNA are
sister groups and, likewise, that Kruger and WNA are closely
related. Only with strain A24 do the pag data suggest that
strains from two distantly related groups (based on chromosomal
markers) are closely related.
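
The distinction between informative and uninformative differences can be made concrete with a small sketch. It is not a parsimony search and does not reproduce the PAUP analysis; it only classifies each variable site and groups strains by their alleles at the informative sites. The genotype data are transcribed from Table 4, and the function names are illustrative.

```python
from collections import Counter, defaultdict

# Alleles at mutations 1-7 and strain counts for the six survey genotypes
# (transcribed from Table 4).
GENOTYPES = {
    "I": ("CGTCCTA", 5),
    "II": ("CGTCCCA", 1),
    "III": ("TGTCCTA", 2),
    "IV": ("TATCCTA", 1),
    "V": ("TGTCTTA", 14),
    "VI": ("TGTTTTA", 3),
}
N_SITES = 7


def allele_counts(site):
    """Number of strains carrying each allele at a 0-based site index."""
    counts = Counter()
    for alleles, n in GENOTYPES.values():
        counts[alleles[site]] += n
    return counts


def is_informative(site):
    """Parsimony informative: at least two alleles, each in >= 2 strains."""
    return sum(1 for n in allele_counts(site).values() if n >= 2) >= 2


def is_singleton(site):
    """Variable site where the rarer allele occurs in exactly one strain."""
    counts = allele_counts(site)
    return len(counts) == 2 and min(counts.values()) == 1


informative = [s + 1 for s in range(N_SITES) if is_informative(s)]
singletons = [s + 1 for s in range(N_SITES) if is_singleton(s)]

# Group strains by their alleles at the informative sites only.
clades = defaultdict(int)
for alleles, n in GENOTYPES.values():
    clades["".join(alleles[s - 1] for s in informative)] += n
clade_sizes = sorted(clades.values())

print(informative, singletons, clade_sizes)
```

For these data the informative sites are mutations 1, 4, and 5, the singletons are mutations 2 and 6, and grouping on the informative sites gives four groups of 3, 3, 6, and 14 strains, matching the clade sizes reported above.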

To determine the pag genotypes and phenotypes of the 
strain(s) involved in the Sverdlovsk incident, nested PCR prim- 
ers (Table 1) were designed to amplify and sequence a 307-bp 
region of pag. This region spans the junction between PA
domains 3 and 4 where much of the variation was observed. 
This analysis uncovered two additional transition mutations (3 
and 7 in Table 3). One was synonymous, while the other was a 



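TABLE 3. Mutations identified in this study

Mutation   Position ᵃ     Nucleotide change   Amino acid change   Frequency
1          [illegible]    C↔T                 Synonymous          [illegible]
2          [illegible]    G↔A                 Synonymous          [illegible]
3          3481           T↔C                 F↔L                 [illegible]
4          3496           C↔T                 P↔S                 [illegible]
5          3602           C↔T                 A↔V                 [illegible]
6          3606           T↔C                 Synonymous          [illegible]
7          3672           A↔G                 Synonymous          [illegible]

[Position 2883 belongs to mutation 1 or 2; the other position is illegible.]

ᵃ Nucleotide positions are based on the 4,235-bp pXO1 sequence from the Sterne
strain, accession no. M22589, containing pag in its entirety.
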



novel missense mutation resulting in a phenylalanine-to-leucine
change. These changes resulted in two additional genotypes
and one new phenotype (Table 4). The amino acid change was,
again, immediately adjacent to the highly antigenic region of
PA domains 3 and 4 (Fig. 1). Repetitive sequencing of these
tissues uncovered multiple PA genotypes within some of the
individual necropsy samples. Together, five different PA genotypes
were observed in the Sverdlovsk samples, with some
samples showing evidence of infection by multiple strains (Table 5).
This finding is consistent with the results of Jackson et
al. (4).
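
The statement that every mutation found here is a transition can be checked mechanically. The sketch below is illustrative only: the allele pairs are read from the columns of Table 4, and the direction of each change is arbitrary because the ancestral state is not known.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# The two alleles observed at each of the seven variable sites (Table 4).
MUTATIONS = {
    1: ("C", "T"),
    2: ("G", "A"),
    3: ("T", "C"),
    4: ("C", "T"),
    5: ("C", "T"),
    6: ("T", "C"),
    7: ("A", "G"),
}


def is_transition(a, b):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    if a == b:
        raise ValueError("alleles must differ")
    pair = {a, b}
    return pair <= PURINES or pair <= PYRIMIDINES


transitions = [m for m, (a, b) in MUTATIONS.items() if is_transition(a, b)]
transversions = [m for m in MUTATIONS if m not in transitions]

print(transitions, transversions)
```

All seven sites, the five from the 26-strain survey and the two found only in the Sverdlovsk samples, classify as transitions.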

Figure 3 is an unrooted phylogenetic tree demonstrating the
five mutational steps leading to the six PA genotypes and three
PA phenotypes identified in this study. Additionally, the putative
positions of the Sverdlovsk samples are shown. However,
because the Sverdlovsk identifications were based on just the
307-bp region around the antigenic portion of PA domains 3
and 4, these placements are only tentative.

Three-dimensional analysis of all the amino acid changes
observed in this study (mutations 3, 4, and 5 in Table 3)
indicated that these changes are not only close sequentially but
also very close in three-dimensional space to the antigenic
region important for LF binding. Mutation 3 (Phe to Leu) is
ca. 11.2 Å, mutation 4 (Pro to Ser) is ca. 20.3 Å, and mutation
5 (Ala to Val) is ca. 19.0 Å from the central portion of this


FIG. 2. Cladistic analysis of the 26 diverse strains. (A) Unrooted maximum
parsimony gene tree based on pag data developed in this study; (B)
diversity groups based on chromosomal AFLP data described by Keim et al. (5).
Branch mutations are numbered as described for Table 3.

TABLE 4. PA genotypes and phenotypes identified in this study ᵃ

                                             Nucleotide (amino acid) at mutation:
PA genotype   PA phenotype ᵇ   Frequency     1    2    3      4      5      6    7
I             FPA              5/26          C    G    T(F)   C(P)   C(A)   T    A
II            FPA              1/26          C    G    T(F)   C(P)   C(A)   C    A
III           FPA              2/26          T    G    T(F)   C(P)   C(A)   T    A
IV            FPA              1/26          T    A    T(F)   C(P)   C(A)   T    A
V             FPV              14/26         T    G    T(F)   C(P)   T(V)   T    A
VI            FSV              3/26          T    G    T(F)   T(S)   T(V)   T    A
VII Svd       LPA              NA            —    —    C(L)   C(P)   C(A)   T    A
VIII Svd      FPA              NA            —    —    T(F)   C(P)   C(A)   T    G

ᵃ [Partly illegible] only in the Sverdlovsk samples); —, the region was not analyzed for the Sverdlovsk samples.
ᵇ [Designated by the single-letter designations] of the three amino acids shown to vary.


region. These spatial distances were estimated solely on peptide
backbone-to-peptide backbone relationships. However,
when the three-dimensional spaces occupied by the side chains
of the amino acids were considered, changes were found to
affect residues as close as 6.9 Å from the central amino acids of
this critical antigenic region.

DISCUSSION 

The protective antigen protein is central to the virulence
associated with anthrax toxin. Elucidation of variation in PA and
its encoding gene could lead to a better understanding of B.
anthracis virulence and evolution. Until now, pag had been
sequenced in its entirety only from a single B. anthracis strain
(12). In this study, a detailed analysis of the entire pag sequence
(2,294 bp) from 26 diverse B. anthracis strains revealed
only five point mutations, corroborating the high degree of
genetic monomorphism found by Keim et al. (5).

Among these mutations, there is a disproportionate number 
of missense (two) to synonymous (three) changes. A common 
ratio of missense to synonymous mutations is approximately 
1:5; here we see a ratio more than threefold greater (7). These 
missense mutations are located near a highly antigenic region, 
critical to LF binding. In monoclonal antibody studies, Little et 
al. demonstrated that by blocking an epitope between amino 
acids Ile-581 and Asn-601 (Fig. 1), they could effectively block 
LF binding to PA (8). Three-dimensional analysis indicated 
that the missense mutations identified in our study are very 



TABLE 5. Tissue samples from Sverdlovsk victims [illegible] this study ᵃ

Sample            PA phenotype(s)
7.RA93.15.15      FPV
40.RA93.40.5      FSV, LPA
[?].RA93.30.3     FPV
37.RA93.35.4      FPA, FPV
17.RA93.35.6      FPA
3.RA93.1.1        FPA
25.RA93.03[?]     FPV
[?].RA93.42.1     FPV
[?].RA93.20.5     FPV
[?]1.RA93.38.4    FPV

Tissue column (nine of ten entries legible; row alignment not recoverable):
Spleen, Spleen, Spleen, Vaccination sit[e], Meninges, Meninges, Meninges,
Meninges, Lymph node.
PA genotype(s) column: [illegible].

ᵃ Deter[...]




close in three-dimensional space to this antigenic region.
While none of the three missense mutations were dramatic,
such as a change from an extremely hydrophobic to a hydrophilic
amino acid, the proline-to-serine change has the potential
to make important three-dimensional alterations, since
proline isomerization is known to play a critical role in protein
folding. Because of their close proximity, these amino acid
changes have the potential to affect LF binding, either directly
or indirectly, within an infected host. The grouping of these
missense mutations near this antigenic region and the disproportionate
number of missense to synonymous mutations suggest
adaptive variation. One of the two new mutations identified
in the Sverdlovsk victims' tissues was found to be a novel
missense mutation located, sequentially and three dimensionally,
near the highly antigenic region of the junction between
PA domains 3 and 4. When these mutations are included with
those identified in the 26-sample survey, the ratio of missense
to synonymous mutations is increased to 3.8:5.
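
The ratios quoted in this discussion follow from simple arithmetic on the mutation counts. The sketch below only restates that arithmetic; the 1:5 baseline is the figure cited in the text (7), and the helper name is illustrative.

```python
from fractions import Fraction


def per_five_synonymous(missense, synonymous):
    """Express a missense:synonymous ratio as x:5."""
    return float(Fraction(missense, synonymous) * 5)


baseline = Fraction(1, 5)   # "common" missense:synonymous ratio cited in the text
survey = Fraction(2, 3)     # 26-strain survey: two missense, three synonymous
combined = Fraction(3, 4)   # adding the Sverdlovsk mutations: three and four

fold_over_baseline = survey / baseline
combined_per_five = per_five_synonymous(3, 4)

print(float(fold_over_baseline), combined_per_five)
```

The survey ratio is 10/3 (about 3.3) times the baseline, consistent with "more than threefold greater", and the combined ratio of 3:4 corresponds to 3.75:5, which the text reports as 3.8:5.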

The amplification and sequencing of the 307-bp pag fragment
from the Sverdlovsk tissue samples suggested that at least
five different strains of B. anthracis were present in the samples
and that some of the individual victims had been infected with
multiple strains. These data corroborate earlier work with the
vrrA locus that suggested that multiple strains of anthrax had
been released during the 1979 incident (1, 4). Besides the
Russian vaccine strain STI-1, included in this study, these tissue
samples are a rare glimpse at the different strains of B.
anthracis that are thought to be endemic in the vast region of
the former Soviet Union. The fact that two previously unobserved
mutations were found in the Sverdlovsk samples
stresses the importance of collecting and analyzing B. anthracis



[...] may be type I, II, or IV but was [...]

FIG. 3. Unrooted phylogenetic tree of PA genotypes. Open boxes show the
three PA phenotypes identified; shaded boxes show the possible positions of the
Sverdlovsk genotypes, VII Svd and VIII Svd. Synonymous mutations are shown in
open circles, and missense mutations are shown in closed circles. Each mutation
is described in Table 3 and the phenotypes are described in Table 4.

strains from areas where anthrax is endemic but largely uncharacterized
by molecular genetic analysis.

Independent cladistic analysis of pXO1 by using the pag
sequence has enabled us to estimate the likelihood of horizontal
transfer of this plasmid between different B. anthracis
strains in natural populations. Although horizontal transfer in
Bacillus spp. is possible under laboratory conditions, the similarity
of the cladistic grouping from the pag data to that of the
chromosomal markers suggests that the differences in pag
arose from evolution within particular strain lineages and were
not a result of horizontal pXO1 transfer. The single possible
exception is associated with the A24 sample, which chromosomally
is related to the Southern Africa strains, while the pag
data for this strain are consistent with Kruger-WNA. This is
either a result of convergent evolution or evidence of horizontal
pXO1 transfer. Further, it should be noted that the data
presented in this report do not rule out the potential for horizontal
transfer of plasmid pXO1 between closely related
strains within an infected host.

The unrooted phylogenetic tree (Fig. 3) is a useful tool for 
demonstrating the relationships between the different PA ge- 
notypes. However, it is not meant to infer an evolution toward 
a particular form of PA. Although distant homologues from 
other gram-positive bacteria are cited (3, 9, 11), none of these 
is close enough to root a B. anthracis PA phylogenetic tree.
Without an ancestral PA sequence, one is unable to determine
which PA phenotypes are ancestral and which are derived.